INDEX
Explanations
initials of certain entities or brands followed by a numerical value, likely referring to specific terms or entities
abbreviations or acronyms related to specific entities or concepts
Negative Logits
wich        -0.89
anism       -0.82
tics        -0.81
sburgh      -0.80
eous        -0.77
ertodd      -0.76
lies        -0.76
iform       -0.76
andowski    -0.73
Compat      -0.69
Positive Logits
SS          0.93
BF          0.91
GI          0.90
GS          0.89
FS          0.87
PM          0.86
BB          0.86
BC          0.86
RR          0.85
WD          0.84
Activations Density: 0.037%