INDEX
Explanations
phrases that discuss perspectives and perceptions of the world and social issues
Negative Logits
onta      -0.16
assin     -0.15
aku       -0.14
avra      -0.14
onto      -0.14
ippy      -0.14
913       -0.14
ATCH      -0.13
wheel     -0.13
temp      -0.13
Positive Logits
differently   0.33
through       0.28
Through       0.26
through       0.25
Through       0.25
THROUGH       0.22
thru          0.21
través        0.19
af            0.19
_through      0.19
Activations Density: 0.144%