INDEX
Explanations
text related to a specific topic or subject
references to a specific topic or subject
Negative Logits
ERROR        -0.71
Zimmer       -0.69
fortune      -0.67
aukee        -0.66
Ka           -0.66
ATES         -0.66
GBT          -0.63
Cop          -0.63
ereo         -0.63
IELD         -0.62
Positive Logits
matter        1.17
ivity         1.15
ively         1.10
ivist         1.02
ivities       1.02
itatively     0.95
ivism         0.93
userc         0.93
icals         0.91
matter        0.90
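Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that convention, with the effect on token j taken as the dot product of the decoder vector with column j of W_U and no layer-norm correction. This may not be exactly how this dashboard computed its values. The function names `feature_logit_effects` and `top_and_bottom` and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs, where effect_j = sum_i decoder_dir[i] * unembed[i][j].

    Assumes (hypothetically) that unembed has shape d_model x vocab_size,
    i.e. one row per residual-stream dimension.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir and unembed must share d_model")
    effects = [0.0] * len(vocab)
    for d_i, row in zip(decoder_dir, unembed):
        if len(row) != len(vocab):
            raise ValueError("each unembed row needs one entry per vocab token")
        # Accumulate this residual dimension's contribution to every token's logit.
        for j, w in enumerate(row):
            effects[j] += d_i * w
    return list(zip(vocab, effects))


def top_and_bottom(pairs: List[Tuple[str, float]], k: int):
    """Return (k most negative, k most positive) pairs, largest magnitude first."""
    ranked = sorted(pairs, key=lambda p: p[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: d_model = 2 and a vocabulary of 3 hypothetical tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.4, 1.0],
           [0.6, 0.2, -0.2]]
vocab = ["a", "b", "c"]
neg, pos = top_and_bottom(feature_logit_effects(decoder_dir, unembed, vocab), k=1)
print(neg, pos)
```

Sorting the full effect vector once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) would avoid the full sort.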
Activations Density 0.015%
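An activation density of 0.015% means the feature fires on a fraction 0.00015 of token positions. That is roughly 15 tokens per 100,000, or about one token in 6,700. Below is a minimal sketch of the calculation. It assumes "fires" means an activation strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation is strictly above threshold.

    The default threshold of 0.0 is an assumption, not a documented dashboard setting.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 15 active positions out of 100,000 corresponds to a density of 0.015%.
acts = [0.0] * 99_985 + [1.0] * 15
density = activation_density(acts)
print(f"{density:.5f} ({density * 100:.3f}%)")
```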