INDEX
Explanations
the abbreviation "OT" with increasing levels of intensity, reaching its highest activation with "OT" at 10
instances of the token "OT," indicating a focus on out-of-target content or specific categories in a dataset
Negative Logits
Token      Logit
bler       -0.72
uckland    -0.68
fixme      -0.68
ーテ       -0.67
plain      -0.62
mM         -0.62
fold       -0.61
nature     -0.61
antha      -0.61
versa      -0.60
Positive Logits
Token      Logit
assium     1.17
TL         1.02
OGR        1.00
ECH        0.97
atoes      0.89
ION        0.88
TE         0.88
TO         0.87
OT         0.86
YP         0.85
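
The source does not say how the logit lists are produced. One common approach, assumed here, is to project the feature's output direction onto each token's unembedding vector and keep the most negative and most positive results. The sketch below illustrates that idea with toy two-dimensional vectors; `logit_effects`, `top_logit_tokens`, `direction`, and `unembed` are hypothetical names, and the vectors are invented so that the projections match three values from the lists.

```python
def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(direction, vector))
        for token, vector in unembed.items()
    }


def top_logit_tokens(effects, k=10):
    """Return (most_negative, most_positive) lists of (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy data (an assumption, not taken from any real model).
direction = [1.0, 0.0]
unembed = {
    "OT": [0.86, 0.3],
    "bler": [-0.72, 0.1],
    "TL": [1.02, -0.5],
}

effects = logit_effects(direction, unembed)
neg, pos = top_logit_tokens(effects, k=1)
print(neg, pos)
```

With a real model the dictionary would cover the whole vocabulary and `k=10` would give two ten-row lists like the ones above.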
Activations Density 0.015%
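
The source does not define activation density. A common reading, assumed here, is the fraction of sampled tokens on which the feature has a non-zero activation. The sketch below computes that fraction on made-up data; `activation_density` and the sample values are hypothetical, chosen so the result matches the 0.015% figure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 3 active tokens out of 20,000 (an assumption for illustration).
acts = [0.0] * 19997 + [2.5, 7.1, 10.0]
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the feature is silent on almost every token, which fits a feature tied to one specific token such as "OT".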