INDEX
Explanations
references to debates and discussions on various topics
Negative Logits
jar     -0.54
pit     -0.51
rob     -0.51
NIL     -0.50
circ    -0.50
deg     -0.50
dart    -0.50
tid     -0.50
NIL     -0.49
ars     -0.49
Positive Logits
enfans             0.87
Monfieur           0.84
desmotivaciones    0.81
Houſe              0.81
navideño           0.79
navideños          0.77
myſelf             0.75
houſe              0.75
näytte             0.73
Anſ                0.73
Activations Density 0.248%