INDEX
Explanations
web links or URLs in a specific format
numerical data or references to significant statistics
Negative Logits
fencing    -0.74
istry      -0.68
Kass       -0.66
cart       -0.65
wagon      -0.62
Zucker     -0.62
Lama       -0.61
Zo         -0.61
vel        -0.61
Ging       -0.61
Positive Logits
120        0.95
0000000    0.92
125        0.86
114        0.86
157        0.86
145        0.85
147        0.85
148        0.85
118        0.85
146        0.85
Activations Density 0.145%